Adventures in the Palmer Penguins Data Set

Antarctic Discovery for Google Analytics Capstone

Step 1 | Ask

The researchers at Palmer Station, Antarctica, set out to collect data about three major Antarctic penguin species: Adelie, Chinstrap and Gentoo. They collected various quantitative measurements about the penguins, including flipper length, bill depth and bill length, along with other data points such as sex and island.

The researchers would like to obtain some insight from this data. They have tasked you, the reader, with gaining the following insights from the Palmer penguins data set:

  1. Graphically represent the relationship between body mass and flipper length by sex and species.
  2. Graphically represent the relationship between bill depth and bill length by sex and species.

Steps 2 & 3 | Prepare and Process for Analysis

The Palmer penguins data set has several columns, but our analysis only needs some of them. First we’ll load the libraries we’ll use to explore the data: the tidyverse and the palmerpenguins package, which contains the data set.
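The startup messages that follow suggest which packages were attached. A minimal sketch of the loading step, assuming the packages are already installed and were attached in this order (the original chunk is not shown, so the order is a guess based on the messages):

```r
# Attach the packages whose startup messages appear in this report.
# Assumption: all four are already installed from CRAN.
library(tidyverse)       # dplyr, ggplot2, tidyr and friends
library(palmerpenguins)  # provides the `penguins` data frame
library(bslib)           # Bootstrap themes for R Markdown output
library(plotly)          # interactive versions of ggplot2 charts
```

After this runs, the `penguins` data frame is available by name without any further import step.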

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## 
## Attaching package: 'bslib'
## 
## 
## The following object is masked from 'package:utils':
## 
##     page
## Warning: package 'plotly' was built under R version 4.3.2
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

Explore the Data and Clean

## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           36.7          19.3               193        3450
## 5 Adelie  Torgersen           39.3          20.6               190        3650
## 6 Adelie  Torgersen           38.9          17.8               181        3625
## # ℹ 2 more variables: sex <fct>, year <int>
## # A tibble: 6 × 8
##   species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>     <fct>           <dbl>         <dbl>             <int>       <int>
## 1 Chinstrap Dream            45.7          17                 195        3650
## 2 Chinstrap Dream            55.8          19.8               207        4000
## 3 Chinstrap Dream            43.5          18.1               202        3400
## 4 Chinstrap Dream            49.6          18.2               193        3775
## 5 Chinstrap Dream            50.8          19                 210        4100
## 6 Chinstrap Dream            50.2          18.7               198        3775
## # ℹ 2 more variables: sex <fct>, year <int>
## Rows: 333
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, 36.7, 39.3, 38.9, 39.2, 41.1, 38.6…
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, 19.3, 20.6, 17.8, 19.6, 17.6, 21.2…
## $ flipper_length_mm <int> 181, 186, 195, 193, 190, 181, 195, 182, 191, 198, 18…
## $ body_mass_g       <int> 3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800…
## $ sex               <fct> male, female, female, female, male, female, male, fe…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
## [1] "species"           "island"            "bill_length_mm"   
## [4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
## [7] "sex"               "year"
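The listings above look like the output of `head()`, `tail()`, `glimpse()` and `colnames()`. The raw `penguins` data frame has 344 rows, so the 333 rows reported here suggest that rows with missing values were dropped first. A sketch under that assumption (the name `penguins_clean` is hypothetical, not taken from the original script):

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Assumption: incomplete rows were removed before exploring,
# which takes the data from 344 rows down to 333.
penguins_clean <- penguins %>% drop_na()

head(penguins_clean)      # first six rows
tail(penguins_clean)      # last six rows
glimpse(penguins_clean)   # column types and a preview of the values
colnames(penguins_clean)  # column names only
```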

Having confirmed that the data shows no errors or complications that require further cleaning, we can select the columns we need for our analysis:

## Rows: 333
## Columns: 6
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, 36.7, 39.3, 38.9, 39.2, 41.1, 38.6…
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, 19.3, 20.6, 17.8, 19.6, 17.6, 21.2…
## $ flipper_length_mm <int> 181, 186, 195, 193, 190, 181, 195, 182, 191, 198, 18…
## $ body_mass_g       <int> 3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800…
## $ sex               <fct> male, female, female, female, male, female, male, fe…
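The six columns shown above can be kept with `dplyr::select()`. This is a sketch rather than the original chunk; the name `penguins_selected` is hypothetical, and dropping `island` and `year` by name would work equally well:

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Keep only the columns needed for the two questions.
penguins_selected <- penguins %>%
  drop_na() %>%  # assumption: same cleaning step as in the exploration
  select(species, bill_length_mm, bill_depth_mm,
         flipper_length_mm, body_mass_g, sex)

glimpse(penguins_selected)
```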

Step 4 | Analyze

Now, we will analyze the data through graphical representation. We’ll use ggplot2 to accomplish our goals.

Question 1 | Flipper Length in Relation to Body Mass by Species and Sex
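One way to draw this relationship with ggplot2 is a scatter plot coloured by species and split into one panel per sex. The aesthetic choices and the name `flipper_mass_plot` are assumptions for illustration, not a record of the original figure:

```r
library(ggplot2)
library(tidyr)
library(palmerpenguins)

penguins_clean <- drop_na(penguins)  # assumption: incomplete rows removed

# Body mass on x, flipper length on y, colour by species, one panel per sex.
flipper_mass_plot <- ggplot(
  penguins_clean,
  aes(x = body_mass_g, y = flipper_length_mm, colour = species)
) +
  geom_point(alpha = 0.7) +
  facet_wrap(~sex) +
  labs(
    title = "Flipper length vs. body mass",
    x = "Body mass (g)",
    y = "Flipper length (mm)",
    colour = "Species"
  )

flipper_mass_plot
```

Since plotly is attached in this report, passing the plot to `plotly::ggplotly()` would give an interactive version; whether the original did so is not shown.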

Observations:

From the data we can see a positive correlation between body mass and flipper length across species. Furthermore, males on average have longer flippers than females in every species. These observations are summarized in the tables below:

Males

Species     No. Individuals   Average Flipper Length (mm)
Adelie      73                192.4110
Chinstrap   34                199.9118
Gentoo      61                221.5410

Females

Species     No. Individuals   Average Flipper Length (mm)
Adelie      73                187.7945
Chinstrap   34                191.7353
Gentoo      58                212.7069
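Summary tables like the two above can be produced with `group_by()` and `summarise()`. A minimal sketch, assuming the means were taken over complete rows only; the column names `n_individuals` and `avg_flipper_length_mm` are hypothetical:

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Count individuals and average flipper length per sex and species.
flipper_summary <- penguins %>%
  drop_na() %>%
  group_by(sex, species) %>%
  summarise(
    n_individuals = n(),
    avg_flipper_length_mm = mean(flipper_length_mm),
    .groups = "drop"
  )

flipper_summary
```

Filtering the result with `filter(flipper_summary, sex == "male")` gives one table per sex, as in the layout above.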

Question 2 | Bill Depth in Relation to Bill Length by Species and Sex
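The bill measurements can be plotted in the same way as the first question, with bill length on one axis and bill depth on the other. As before, this is a sketch with assumed aesthetics and a hypothetical object name (`bill_plot`), not the original figure:

```r
library(ggplot2)
library(tidyr)
library(palmerpenguins)

penguins_clean <- drop_na(penguins)  # assumption: incomplete rows removed

# Bill length on x, bill depth on y, colour by species, one panel per sex.
bill_plot <- ggplot(
  penguins_clean,
  aes(x = bill_length_mm, y = bill_depth_mm, colour = species)
) +
  geom_point(alpha = 0.7) +
  facet_wrap(~sex) +
  labs(
    title = "Bill depth vs. bill length",
    x = "Bill length (mm)",
    y = "Bill depth (mm)",
    colour = "Species"
  )

bill_plot
```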

Observations:

Bill depth was most similar between Adelie and Chinstrap penguins. On average, males have both deeper and longer bills than females in every species. These observations are summarized in the tables below:

Males

Species     No. Individuals   Average Bill Depth (mm)   Average Bill Length (mm)
Adelie      73                19.07260                  40.39041
Chinstrap   34                19.25294                  51.09412
Gentoo      61                15.71803                  49.47377

Females

Species     No. Individuals   Average Bill Depth (mm)   Average Bill Length (mm)
Adelie      73                17.62192                  37.25753
Chinstrap   34                17.58824                  46.57353
Gentoo      58                14.23793                  45.56379
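The bill tables above follow the same pattern as the flipper summary, with two averages instead of one. A sketch under the same assumptions (complete rows only, hypothetical column names):

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Count individuals and average both bill measurements per sex and species.
bill_summary <- penguins %>%
  drop_na() %>%
  group_by(sex, species) %>%
  summarise(
    n_individuals = n(),
    avg_bill_depth_mm = mean(bill_depth_mm),
    avg_bill_length_mm = mean(bill_length_mm),
    .groups = "drop"
  )

bill_summary
```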

Step 5 | Share

This report has been made available on GitHub, where viewers can read it, comment on it and see the R scripts used to analyse the data. The data is also public, and I expect many similar reports exist on the internet. However, the Palmer penguins data is exceptionally fun to play with, as I have a special interest in Antarctica myself. There will be no final phase (Step 6) to ‘act’ on the data.

Appendix

Citations:

citation(package = "palmerpenguins")
## To cite palmerpenguins in publications use:
## 
##   Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer
##   Archipelago (Antarctica) penguin data. R package version 0.1.0.
##   https://allisonhorst.github.io/palmerpenguins/. doi:
##   10.5281/zenodo.3960218.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {palmerpenguins: Palmer Archipelago (Antarctica) penguin data},
##     author = {Allison Marie Horst and Alison Presmanes Hill and Kristen B Gorman},
##     year = {2020},
##     note = {R package version 0.1.0},
##     doi = {10.5281/zenodo.3960218},
##     url = {https://allisonhorst.github.io/palmerpenguins/},
##   }
citation(package = 'bslib')
## To cite package 'bslib' in publications use:
## 
##   Sievert C, Cheng J, Aden-Buie G (2023). _bslib: Custom 'Bootstrap'
##   'Sass' Themes for 'shiny' and 'rmarkdown'_. R package version 0.5.0,
##   <https://CRAN.R-project.org/package=bslib>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {bslib: Custom 'Bootstrap' 'Sass' Themes for 'shiny' and 'rmarkdown'},
##     author = {Carson Sievert and Joe Cheng and Garrick Aden-Buie},
##     year = {2023},
##     note = {R package version 0.5.0},
##     url = {https://CRAN.R-project.org/package=bslib},
##   }
citation(package = 'tidyverse')
## To cite package 'tidyverse' in publications use:
## 
##   Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R,
##   Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller
##   E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V,
##   Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to
##   the tidyverse." _Journal of Open Source Software_, *4*(43), 1686.
##   doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {Welcome to the {tidyverse}},
##     author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani},
##     year = {2019},
##     journal = {Journal of Open Source Software},
##     volume = {4},
##     number = {43},
##     pages = {1686},
##     doi = {10.21105/joss.01686},
##   }

Unbroken Code Chunk: